A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role for many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs which are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: firstly, constructing dense graphs is infeasible in practice for large datasets, and secondly, the runtime of downstream tasks is directly influenced by the sparsity of the similarity graph. In this work, we present $\textit{Stars}$: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons, which are a major bottleneck for learning-based models where comparisons are expensive to evaluate. Theoretically, we demonstrate that Stars builds a graph in nearly-linear time, where approximate nearest neighbors are contained within two-hop neighborhoods. In practice, we have deployed Stars for multiple datasets, allowing for graph building at the $\textit{Tera-Scale}$, i.e., for graphs with tens of trillions of edges. We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10- to 1000-fold improvements in pairwise similarity comparisons compared to different baselines, and 2- to 10-fold improvements in running time without quality loss.
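The two-hop idea can be illustrated with a toy star construction: a few points act as leaders, every other point attaches to its nearest leader, and points sharing a leader are at most two hops apart. This is a minimal sketch under illustrative assumptions (random leader sampling, a user-supplied distance), not the Stars algorithm itself:

```python
import random

def two_hop_star_graph(points, num_leaders, dist, seed=0):
    """Build a sparse graph where each point connects to its nearest
    leader; two points sharing a leader are two hops apart."""
    rng = random.Random(seed)
    leaders = rng.sample(range(len(points)), num_leaders)
    edges = set()
    for i, p in enumerate(points):
        nearest = min(leaders, key=lambda l: dist(p, points[l]))
        if i != nearest:
            edges.add((min(i, nearest), max(i, nearest)))
    return leaders, edges

# 1-D toy data: two tight clusters.
pts = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
leaders, edges = two_hop_star_graph(pts, 2, dist=lambda a, b: abs(a - b))
# Every non-leader contributes exactly one edge, so the graph has O(n) edges.
```

Each point performs only `num_leaders` similarity comparisons, which is the kind of comparison saving the abstract refers to, albeit in a drastically simplified form.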
Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large powerful networks into smaller lower-capacity models. Online distillation, in which both the teacher and the student are learning collaboratively, has also gained much interest due to its ability to improve on the performance of the networks involved. The Kullback-Leibler (KL) divergence ensures the proper knowledge transfer between the teacher and student. However, most online KD techniques present some bottlenecks under the network capacity gap. When the models are trained cooperatively and simultaneously, the KL divergence becomes incapable of properly minimizing the distance between the teacher's and student's distributions. Alongside accuracy, critical edge device applications are in need of well-calibrated compact networks. Confidence calibration provides a sensible way of getting trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing design at the level of the student distillation loss, we improve upon both performance accuracy and calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.
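The core loss ingredient, mixing the forward divergence KL(teacher || student) with the reverse divergence KL(student || teacher), can be sketched for discrete distributions as below. The fixed `alpha` weight is an illustrative stand-in; the paper's adaptive balancing scheme is not reproduced here:

```python
import math

def kl(p, q, eps=1e-12):
    """Forward KL divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def balanced_divergence(teacher, student, alpha=0.5):
    """Weighted mix of forward KL(teacher || student) and reverse
    KL(student || teacher). alpha controls the balance between the two
    directions (a fixed weight here; the paper adapts it)."""
    return alpha * kl(teacher, student) + (1 - alpha) * kl(student, teacher)

t = [0.7, 0.2, 0.1]  # teacher softmax output
s = [0.5, 0.3, 0.2]  # student softmax output
loss = balanced_divergence(t, s, alpha=0.5)
```

At `alpha=0.5` the loss is symmetric in the two distributions; skewing `alpha` toward the reverse term focuses training on modes the student already covers.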
The Age-of-Information (AoI) metric has been widely studied in the theoretical communication networks and queuing systems literature. However, experimental evaluation of its applicability to complex real-world time-sensitive systems is largely lacking. In this work, we develop, implement, and evaluate an AoI-based application layer middleware that enables the customization of WiFi networks to the needs of time-sensitive applications. By controlling the storage and flow of information in the underlying WiFi network, our middleware can: (i) prevent packet collisions; (ii) discard stale packets that are no longer useful; and (iii) dynamically prioritize the transmission of the most relevant information. To demonstrate the benefits of our middleware, we implement a mobility tracking application using a swarm of UAVs communicating with a central controller via WiFi. Our experimental results show that, when compared to WiFi-UDP/WiFi-TCP, the middleware can improve information freshness by a factor of 109x/48x and tracking accuracy by a factor of 4x/6x, respectively. Most importantly, our results also show that the performance gains of our approach increase as the system scales and/or the traffic load increases.
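The middleware's stale-packet discard and freshest-first delivery can be illustrated with a toy application-layer buffer that keeps only the most recent packet per source and refuses to deliver packets older than a freshness bound. Class and field names, and the threshold, are illustrative assumptions, not the paper's implementation:

```python
class FreshnessBuffer:
    """Toy buffer in the spirit of the AoI middleware: keeps only the
    most recent packet per source and drops packets whose age exceeds
    max_age seconds."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.latest = {}  # source -> (timestamp, payload)

    def push(self, source, timestamp, payload):
        # Keep only the freshest packet per source.
        prev = self.latest.get(source)
        if prev is None or timestamp > prev[0]:
            self.latest[source] = (timestamp, payload)

    def pop_fresh(self, source, now):
        """Return the stored payload only if its age is within bounds."""
        entry = self.latest.get(source)
        if entry is None or now - entry[0] > self.max_age:
            return None  # stale or missing: discard
        return entry[1]

buf = FreshnessBuffer(max_age=1.0)
buf.push("uav-1", timestamp=10.0, payload={"x": 1, "y": 2})
buf.push("uav-1", timestamp=10.5, payload={"x": 3, "y": 4})  # supersedes the older update
```

A query at `now=11.0` returns the second update (age 0.5 s); by `now=12.0` that update has aged out and is discarded rather than delivered stale.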
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
Knowledge distillation (KD) is an effective tool for compressing deep classification models for edge devices. However, the performance of KD is affected by the large capacity gap between teacher and student networks. Recent methods have resorted to a multiple teacher assistant (TA) setting for KD, which sequentially reduces the size of the teacher model to relatively bridge the size gap between these models. This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation to efficiently enhance the learning of a compact student under the capacity gap problem. The technique is built upon the hypothesis that a student network should be guided gradually using a stratified teaching curriculum, as it learns easy (hard) data samples better from a lower- (higher-) capacity teacher network. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image, based on a curriculum driven by the difficulty of classifying the image. In this work, we empirically verify our hypothesis and conduct rigorous experiments on the CIFAR-10, CIFAR-100, CINIC-10, and ImageNet datasets, showing improved accuracy on VGG-like models, ResNets, and WideResNet architectures.
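The curriculum routing idea, sending easy images to low-capacity teachers and hard images to high-capacity ones, can be sketched as a simple mapping from a per-image difficulty score to a teacher index. This is a toy stand-in for the paper's expert selection, with the scoring and binning chosen purely for illustration:

```python
def select_teacher(difficulty, num_teachers):
    """Map a per-image difficulty score in [0, 1] to one of several
    teachers ordered from lowest to highest capacity: harder images
    are routed to larger-capacity teachers."""
    return min(int(difficulty * num_teachers), num_teachers - 1)

# Easy samples go to the smallest TA, hard samples to the full teacher.
assignments = [select_teacher(d, 4) for d in (0.05, 0.4, 0.6, 0.99)]
```

With four teachers of increasing capacity, the four example difficulties are routed to teachers 0 through 3 in order, matching the stratified-curriculum hypothesis above.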
In this paper, we propose a patch-based architecture for multi-label classification problems where only a single positive label is observed per image in the dataset. Our contribution is twofold. First, we introduce a lightweight patch architecture based on the attention mechanism. Next, leveraging patch embedding self-similarities, we provide a novel strategy to estimate negative examples and to handle positive and unlabeled learning problems. Experiments show that our architecture can be trained from scratch, whereas related methods in the literature require pre-training on similar databases.
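One simple way to realize negative estimation from embedding self-similarity is to treat unlabeled patches that are sufficiently dissimilar to the observed positive as pseudo-negatives. The sketch below uses plain cosine similarity and a fixed threshold, both illustrative assumptions rather than the paper's strategy:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def estimate_negatives(patch_embs, positive_emb, threshold=0.5):
    """Return indices of unlabeled patches whose embedding is
    sufficiently dissimilar to the observed positive; these are taken
    as pseudo-negatives for positive-and-unlabeled training."""
    return [i for i, e in enumerate(patch_embs)
            if cosine(e, positive_emb) < threshold]

patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
negatives = estimate_negatives(patches, positive_emb=[1.0, 0.0])
```

Here only the orthogonal third patch falls below the similarity threshold and is flagged as a pseudo-negative.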
Fine-tuned BERT-based models are resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques such as pruning, these works do not explicitly address the computational challenges of training on downstream tasks. We introduce Learner modules and priming, novel methods that exploit the over-parameterization of pre-trained language models to gain benefits in convergence speed and resource utilization. Learner modules navigate the double bind of 1) training efficiently, by fine-tuning a subset of parameters, and 2) training effectively, by ensuring quick convergence and high metric scores. Our results on DistilBERT show that Learners perform on par with or surpass the baselines. Learners train with 7x fewer parameters than state-of-the-art methods on GLUE. On CoLA, Learners fine-tune 20% faster, with significantly lower resource utilization.
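The "fine-tune a subset of parameters" idea can be illustrated with an update step that only touches parameters marked trainable, leaving the rest of the pre-trained model frozen. This is a toy illustration of subset fine-tuning in general, not the paper's Learner modules; all names are hypothetical:

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """One SGD step that updates only the parameters whose names are in
    `trainable`, leaving all frozen pre-trained weights untouched."""
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }

params = {"encoder.w": 1.0, "learner.w": 2.0}
grads = {"encoder.w": 0.5, "learner.w": 0.5}
new_params = sgd_step(params, grads, trainable={"learner.w"})
# encoder.w stays frozen; only learner.w moves.
```

Only the small trainable subset accumulates optimizer state and gradient traffic, which is where the resource-utilization savings come from.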
Machine learning (ML) research has generally focused on models, while the most prominent datasets have been employed for everyday ML tasks without regard for the breadth, difficulty, and faithfulness of these datasets to the underlying problem. Neglecting the fundamental importance of datasets has caused major problems, involving data cascades in real-world applications and saturation of dataset-driven benchmarks for model quality, which hinders the growth of research. To address this issue, we present DataPerf, a benchmark package for evaluating ML datasets and dataset-centric algorithms. We intend to enable a "data ratchet" in which training sets help evaluate test sets on the same problem, and vice versa. Such a feedback-driven strategy will generate a virtuous cycle that accelerates data-centric AI. The MLCommons Association will maintain DataPerf.
Purpose: To utilize high-resolution quantitative CT (QCT) imaging features for the prediction of diagnosis and prognosis in fibrotic interstitial lung disease (ILD). Approach: 40 ILD patients (20 with usual interstitial pneumonia (UIP), 20 with non-UIP pattern ILD) were classified by expert consensus of 2 radiologists and followed for 7 years. Clinical variables were recorded. After segmentation of the lung field, a total of 26 texture features were extracted using a lattice-based approach (TM model). The TM model was compared with a previous histogram-based model (HM) for classifying UIP vs. non-UIP. For prognostic assessment, survival analysis was performed comparing expert diagnostic labels against the TM metrics. Results: In the classification analysis, the TM model outperformed the HM method, with an AUC of 0.70. While the survival curves for the UIP vs. non-UIP expert labels were not statistically different in Cox regression analysis, the TM QCT features allowed a statistically significant partitioning of the cohort. Conclusion: The TM model outperformed the HM model in distinguishing UIP from non-UIP patterns. Most importantly, TM allowed partitioning of the cohort into distinct survival groups, whereas the expert UIP vs. non-UIP labels did not. QCT TM models may improve the diagnosis of ILD and provide more accurate prognoses, better guiding patient management.
We present a deep learning-based automatic cough classifier which can discriminate tuberculosis (TB) coughs from COVID-19 coughs and healthy coughs. Both TB and COVID-19 are respiratory diseases, are contagious, have cough as a predominant symptom, and claim thousands of lives each year. The cough audio recordings were collected both in indoor and outdoor settings and were uploaded using smartphones from subjects around the world, and thus contain various noise. The cough data comprise 1.68 hours of TB coughs, 18.54 minutes of COVID-19 coughs, and 1.69 hours of healthy coughs from 47 TB patients, 229 COVID-19 patients, and 1498 healthy patients, and were used to train and evaluate a CNN, an LSTM, and a ResNet50. These three deep architectures were also pre-trained on 2.14 hours of sneezes, 2.91 hours of speech, and 2.79 hours of noise to improve performance. The class imbalance in our dataset was addressed by using the SMOTE data balancing technique and by using performance metrics such as F1 score and AUC. Our study shows that the highest F1 scores of 0.9259 and 0.8631 were achieved by the pre-trained ResNet50 for the two-class (TB vs. COVID-19) and three-class (TB vs. COVID-19 vs. healthy) cough classification tasks, respectively. The application of deep transfer learning improved the classifiers' performance and made them more robust, as they generalized better over cross-validation folds. Their performance exceeds the TB triage test requirements set by the World Health Organization (WHO). The features producing the best performance contained higher-order MFCCs, suggesting that the differences between TB and COVID-19 coughs are not perceivable by the human ear. This type of cough audio classification is non-contact, cost-effective, and can easily be deployed on a smartphone, making it an excellent tool for both TB and COVID-19 screening.
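The SMOTE balancing step mentioned above synthesizes new minority-class samples by interpolating between a real sample and one of its nearest neighbours. A minimal sketch over plain feature vectors (the real pipeline would operate on MFCC feature vectors and use a library implementation):

```python
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    """Minimal SMOTE sketch: synthesize n_new minority samples by linear
    interpolation between a random sample and one of its k nearest
    neighbours (Euclidean distance, plain-list feature vectors)."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(base, nb)])
    return synthetic

# Toy 2-D stand-ins for minority-class (e.g. COVID-19 cough) feature vectors.
minority_feats = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25]]
extra = smote_oversample(minority_feats, n_new=4)
```

Each synthetic point lies on a segment between two real minority samples, so oversampling densifies the minority region rather than duplicating exact samples.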